Sathiyabama, S.
- Bisecting K-Means Clustering Approach for High Dimensional Dataset
Authors
1 Bharathiar University, Coimbatore, Tamil Nadu, IN
2 MCA Department, K. S. Rangasamy College of Technology, Tiruchengode, Tamil Nadu, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 2 (2011), Pagination: 138-141
Abstract
High dimensional data is a common phenomenon in real-world data mining applications. Developing effective clustering methods for high dimensional datasets is challenging because of the curse of dimensionality. The k-means clustering algorithm is commonly used, but it is time consuming and computationally expensive, and the quality of the resulting clusters depends on the choice of initial centroids and on the dimensionality of the data. When the dimensionality of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. Hence, to improve the efficiency and accuracy of mining tasks on high dimensional data, the data must be pre-processed with an efficient dimensionality reduction method. This paper proposes a method in which the high dimensional data is reduced through Principal Component Analysis, and bisecting k-means clustering, which requires no initialization of the centroids, is then performed on the reduced data.
Keywords
Bisecting K-Means, Dimensionality Reduction, K-Means, Principal Component Analysis, Principal Components.
- Asynchronous Periodic Pattern Mining for Cyclic and Incremental Sequential Time Stamp
Authors
1 M. Kumarasamy College of Engineering, Karur, IN
2 Selvam College of Technology, Namakkal, IN
3 K. S. Rangasamy College of Technology, Tiruchengode, IN
Source
Data Mining and Knowledge Engineering, Vol 2, No 3 (2010), Pagination: 33-36
Abstract
Mining periodic patterns in time-series databases is an interesting data mining problem. It can be envisioned as a tool for forecasting and predicting the future behavior of time-series data. Most research has focused on mining synchronous periodic patterns, but in practice some periodic patterns cannot be recognized because of random noise and disturbance in large datasets. To increase the efficiency of mining asynchronous periodic patterns on large datasets, the proposed work aims to find all maximal complex patterns with a single-step algorithm that uses a single dataset scan, without explicitly mining single-event and multiple-event patterns. The asynchronous periodic patterns are mined using a depth-first search technique. Three parameters are employed to specify the minimum number of repetitions required for a valid segment of non-disrupted pattern occurrences, the maximum allowed disturbance between two successive valid segments, and the total number of repetitions required for a valid sequence.
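The role of the three parameters can be illustrated with a minimal sketch. It assumes that the match positions of one pattern with a fixed period are already known. The names `min_rep`, `max_dis`, and `global_rep`, the helper functions, and the greedy chaining of neighbouring segments are assumptions made for illustration; they are not taken from the paper and do not reproduce its depth-first search.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start position, number of repetitions)


def valid_segments(positions: List[int], period: int, min_rep: int) -> List[Segment]:
    """Group sorted match positions into runs spaced exactly `period` apart
    and keep only the runs with at least `min_rep` repetitions."""
    segments: List[Segment] = []
    run_start, run_len = 0, 0
    for pos in positions:
        if run_len and pos == run_start + run_len * period:
            run_len += 1  # the run continues without disruption
        else:
            if run_len >= min_rep:
                segments.append((run_start, run_len))
            run_start, run_len = pos, 1  # start a new run
    if run_len >= min_rep:
        segments.append((run_start, run_len))
    return segments


def best_sequence_repetitions(segments: List[Segment], period: int, max_dis: int) -> int:
    """Chain neighbouring valid segments whose gap is at most `max_dis`
    and return the largest total repetition count of any chain."""
    best = current = 0
    prev_end = None
    for start, reps in segments:
        if prev_end is not None and 0 <= start - prev_end <= max_dis:
            current += reps  # disturbance is tolerable: extend the chain
        else:
            current = reps   # disturbance too large: start a new chain
        best = max(best, current)
        prev_end = start + reps * period  # first position after this segment
    return best


def is_valid_sequence(positions: List[int], period: int,
                      min_rep: int, max_dis: int, global_rep: int) -> bool:
    """True if some chain of valid segments reaches `global_rep` repetitions."""
    segments = valid_segments(positions, period, min_rep)
    return best_sequence_repetitions(segments, period, max_dis) >= global_rep


# Hypothetical match positions for a pattern with period 3.
positions = [0, 3, 6, 11, 14, 17, 20, 40]
segments = valid_segments(positions, period=3, min_rep=3)
```

In this example the isolated match at position 40 does not form a valid segment, and the two remaining segments are separated by a disturbance of 2 positions, so they are chained only when `max_dis` is at least 2.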
To find partial periodic patterns over multiple periods, one naive method is to loop over each single period using the hit-set-based approach, that is, to apply the max-pattern hit set method directly to the sequence for each period. The proposal evaluates another method for multiple periods, based on shared mining, which is similar to the single-period max-pattern mining algorithm. During the first scan of the sequence, the frequent patterns and candidate max-patterns are generated for all periods. During the second scan, the hit sets of all periods are generated, as in the second scan of the max-pattern hit set method. The construction of the max-subpattern tree and the derivation of frequent patterns from it are also discussed.
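The two scans described above can be sketched for a single period as follows. This is a simplified illustration, not the paper's algorithm: it assumes one event per time point, stores the hit set in a plain counter instead of a max-subpattern tree, and uses a hypothetical confidence threshold `min_conf`. The function names and the `*` wildcard symbol are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def hit_set(seq: str, period: int, min_conf: float) -> Tuple[Set[Tuple[int, str]], Counter, int]:
    """Two-scan sketch: find frequent 1-patterns, then count the hits of
    every period segment against the candidate max-pattern."""
    # Cut the sequence into whole period segments (an incomplete tail is dropped).
    segments = [seq[i:i + period] for i in range(0, len(seq) - period + 1, period)]
    m = len(segments)

    # Scan 1: count each (offset, event) pair and keep the frequent ones.
    counts = Counter((j, seg[j]) for seg in segments for j in range(period))
    frequent = {key for key, c in counts.items() if c / m >= min_conf}

    # Scan 2: the hit of a segment keeps its frequent events and
    # replaces every other position with the wildcard '*'.
    hits: Counter = Counter()
    for seg in segments:
        hit = "".join(ch if (j, ch) in frequent else "*" for j, ch in enumerate(seg))
        if hit.strip("*"):  # ignore segments that match nothing
            hits[hit] += 1
    return frequent, hits, m


def frequency(pattern: str, hits: Dict[str, int]) -> int:
    """Derive the frequency of a pattern by summing the counts of
    all hits that contain it as a subpattern."""
    return sum(
        count
        for hit, count in hits.items()
        if all(p == "*" or p == h for p, h in zip(pattern, hit))
    )


# Hypothetical event sequence with period 3: segments abc, abd, abc, aec.
frequent, hits, m = hit_set("abcabdabcaec", period=3, min_conf=0.75)
```

The naive multi-period method would call `hit_set` once per candidate period, scanning the sequence twice each time. The shared-mining variant described in the abstract instead collects the counts for all periods within the same two scans.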
The proposed work integrates cyclic partial periodic patterns and incremental patterns with adaptive thresholds to support interactive mining of partial periodic patterns. The maximum pattern search space and the execution overhead of the proposed work are analyzed with synthetic data and with real datasets from the UCI Repository. In addition, the work moves toward merge mining, a generalization of incremental mining that discovers the patterns of two or more databases that are mined independently of each other. An improved version of the merge mining algorithm is planned for asynchronous partial periodic patterns in time-series databases.